Optimization by Hendrik-code · Pull Request #111 · Hendrik-code/TPTBox

Hendrik-code · 2026-06-11T18:24:24Z

No description provided.

…any) cc3d statistics also computes centroids+bboxes that np_volume discards. For many-label arrays (e.g. connected-component maps) np.bincount is far cheaper: ~37x faster on a 64k-component map and ~3x on ~400 labels, while cc3d stays faster for the few-label anatomical-segmentation case. Switch on a cheap arr.max()>256 check to get best-of-both. Verified equal across dtypes / label counts / include_zero; speedtest_volume.py added. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
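A minimal sketch of the `np.bincount` branch described above, assuming non-negative integer labels. The function name `volume_by_bincount` is hypothetical and is not the library's `np_volume`; the cc3d branch and the `arr.max()>256` switch are not reproduced here.

```python
import numpy as np


def volume_by_bincount(arr: np.ndarray, include_zero: bool = False) -> dict[int, int]:
    """Count voxels per label in one pass (hypothetical helper, non-negative integer labels only)."""
    counts = np.bincount(arr.ravel())   # counts[k] == number of voxels with label k
    labels = np.flatnonzero(counts)     # labels that actually occur
    if not include_zero:
        labels = labels[labels != 0]
    return {int(lab): int(counts[lab]) for lab in labels}


rng = np.random.default_rng(0)
seg = rng.integers(0, 1000, size=(32, 32, 32)).astype(np.uint16)
seg[0, 0, 0] = 0  # make sure the background label is present
vol = volume_by_bincount(seg)
```

The cost is one pass over the array plus a table of size `arr.max() + 1`, so it only counts voxels and computes no centroids or bounding boxes.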

For 3D arrays, two of the three axis extents are derived from a single shared 2D projection (np.any over the contiguous last axis), so only one extra full reduction is needed. ~13-18% faster across 256^3/512^3 and px_dist values; identical slices (verified vs old impl on 2D+3D, incl. empty handling). Generic n-D path unchanged. speedtest_bbox_binary.py added. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
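The shared-projection idea can be sketched as follows. This is an illustration under assumptions, not the code of `np_bbox_binary`: the name `bbox_binary_3d` is hypothetical, and raising on an empty mask is a choice made here, since the commit message does not say how empty input is handled.

```python
import numpy as np


def bbox_binary_3d(mask: np.ndarray, px_dist: int = 0) -> tuple[slice, ...]:
    """Bounding box of a non-empty 3D mask as slices, padded by px_dist and clipped to the shape."""
    proj = mask.any(axis=2)  # full reduction 1: collapse the contiguous last axis to a 2D projection
    if not proj.any():
        raise ValueError("empty mask")  # assumption of this sketch
    hits = (
        proj.any(axis=1),       # axis-0 extent, from the cheap 2D projection
        proj.any(axis=0),       # axis-1 extent, from the same projection
        mask.any(axis=(0, 1)),  # full reduction 2: axis-2 extent
    )
    out = []
    for hit, size in zip(hits, mask.shape):
        idx = np.flatnonzero(hit)
        lo = max(int(idx[0]) - px_dist, 0)
        hi = min(int(idx[-1]) + 1 + px_dist, size)
        out.append(slice(lo, hi))
    return tuple(out)


mask = np.zeros((10, 12, 14), dtype=bool)
mask[2:5, 3:7, 4:9] = True
box = bbox_binary_3d(mask)
```

Only two reductions touch the full volume; the two extents taken from `proj` operate on a 2D array.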

…ding_boxes np_center_of_mass and np_bounding_boxes built a `unique` list then filtered with `idx in unique` (O(max_label x n_unique)). Check voxel_counts[idx] directly instead. ~2.2x faster at ~4k labels, ~1.3x at ~2k, unchanged for few labels; identical output (use_crop preserved). speedtest_center_of_mass.py added. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
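The difference between the two filters can be illustrated as below. This is a sketch only: `np.bincount` stands in for the per-label voxel counts that a statistics pass would return, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
seg = rng.choice(np.array([0, 3, 7, 4000], dtype=np.uint16), size=(16, 16, 16))
voxel_counts = np.bincount(seg.ravel())  # stand-in for per-label voxel counts

# Old pattern: build a list of present labels, then scan it for every candidate index.
unique = [int(i) for i in np.flatnonzero(voxel_counts)]
present_old = [idx for idx in range(1, voxel_counts.size) if idx in unique]

# New pattern: index the counts directly, O(1) per candidate index.
present_new = [idx for idx in range(1, voxel_counts.size) if voxel_counts[idx] > 0]
```

The old membership test scans a Python list once per candidate index, which is where the `max_label x n_unique` cost comes from.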

…+mask The per-label per-iteration 'data = out.copy(); data[i != data] = 0' was a full array copy plus a masked write. _binary_dilation casts its input to bool anyway, so 'data = out == i' is bit-exact (verified across 6480 configs) and skips the copy. ~11-18% faster across n_pixel/connectivity on few-label 150^3 segs. The n_pixel loop is kept (it encodes iterative inter-label competition that a single larger-kernel pass would not reproduce). speedtest_dilate_vectorized.py added. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
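A small check of the equivalence claimed above, for a non-zero label `i` and assuming the dilation casts its input to bool as stated. The array is made up for illustration.

```python
import numpy as np

out = np.array([[0, 1, 1], [2, 2, 0], [3, 0, 1]], dtype=np.uint8)
i = 1  # a non-zero label

# Old pattern: full copy plus a masked write; the dilation later casts this to bool.
data_old = out.copy()
data_old[i != data_old] = 0
mask_old = data_old.astype(bool)

# New pattern: one comparison yields the boolean mask directly, with no copy.
mask_new = out == i
```

For the background label 0 the two patterns differ (the old one yields an all-False mask), so the equivalence holds only for non-zero labels.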

np.isin is 3-6x slower than a boolean lookup table for multi-label membership on uint segmentation masks. New np_isin() builds lut[labels]=True and gathers lut[arr] for unsigned arrays with a small label range; it special-cases the single-label (arr==label) case and falls back to np.isin for signed, negative, or huge-range inputs. Verified equal to np.isin across 132 dtype/label/invert cases. Applied at the 9 multi-label np.isin sites (extract_label, erode/dilate (+euclid), connected_components, filter_connected_components). numpy's own kind='table' did not help. speedtest_isin_lut.py added. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
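A rough sketch of the lookup-table approach, written from the description above rather than from the actual `np_isin` source. The name `isin_lut` and the `max_range` cut-off are assumptions made for this example.

```python
import numpy as np


def isin_lut(arr: np.ndarray, labels, invert: bool = False, max_range: int = 1 << 16) -> np.ndarray:
    """Membership test via a boolean lookup table (hypothetical helper; max_range is an assumed cut-off)."""
    labels = np.asarray(labels).ravel()
    usable = (
        arr.size > 0
        and arr.dtype.kind == "u"        # unsigned array values are safe as table indices
        and labels.size > 0
        and labels.dtype.kind in "ui"
        and labels.min() >= 0
    )
    if not usable:
        return np.isin(arr, labels, invert=invert)   # signed or otherwise unsuitable input
    if labels.size == 1:                             # single label: a plain comparison is enough
        return (arr != labels[0]) if invert else (arr == labels[0])
    size = max(int(arr.max()), int(labels.max())) + 1
    if size > max_range:
        return np.isin(arr, labels, invert=invert)   # table would be too large
    lut = np.zeros(size, dtype=bool)
    lut[labels] = True   # mark the requested labels
    hit = lut[arr]       # gather: one table lookup per voxel
    return ~hit if invert else hit


arr = np.arange(200, dtype=np.uint8).reshape(10, 20)
selected = isin_lut(arr, [1, 3, 199])
```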

The keep_label path called get_seg_array() twice (two full array copies) plus np_extract_label and a multiply. Now it takes a single copy and zeros voxels not in the label set via np_isin, keeping original label values. ~1.74x faster on a 300^3 mask; output identical (verified scalar+list, keep+binary paths). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The per-label 'seg_arr[seg_arr == l] = 0' loop costs one full pass per removed label (linear in label count). A single np_map_labels gather ({label: fill}) costs one pass regardless of how many labels are removed: tied with the loop for a few labels, ~2.2x faster at 20 labels (sparse) and ~6x on dense masks. Enums are now resolved to .value like extract_label does (the int path is unchanged). Verified equal across scalar/list/nested labels and removed_to_label values. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
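The single-gather idea can be sketched with a plain NumPy lookup table. This does not use `np_map_labels` itself; `remove_labels_lut` is a hypothetical stand-in that assumes a non-empty array of non-negative integer labels.

```python
import numpy as np


def remove_labels_lut(seg: np.ndarray, labels, removed_to_label: int = 0) -> np.ndarray:
    """Replace every label in `labels` with `removed_to_label` in one gather (hypothetical helper)."""
    lut = np.arange(int(seg.max()) + 1, dtype=seg.dtype)  # identity mapping: label -> label
    labels = np.asarray(labels, dtype=np.intp).ravel()
    labels = labels[labels < lut.size]  # labels absent from the array need no table entry
    lut[labels] = removed_to_label      # redirect the removed labels
    return lut[seg]                     # one pass, however many labels are removed


rng = np.random.default_rng(0)
seg = rng.integers(0, 30, size=(20, 20, 20)).astype(np.uint8)
cleaned = remove_labels_lut(seg, [2, 5, 40])
```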

… verbose The two np_unique full-array scans only feed the verbose log line. Guarding them behind 'verbose' makes the common in-loop verbose=False path ~5x faster on a 300^3 mask. The verbose=True output is unchanged; the returned data is identical either way. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

It called extract_label(...).get_seg_array() twice (each a full round-trip: copy + np_extract_label + NII construction + copy) just to get two binary masks. Both masks now come directly from the single get_array() via np_isin. ~2.15x faster on a 300^3 mask; output identical (verified across idx/not_beyond/axis/inclusion). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…mass pass The default _crop path looped over every label doing extract_label(i) + compute_crop + scipy center_of_mass. np_center_of_mass (cc3d) returns every label's centroid in a single pass. ~5x faster at 8-16 labels and ~9x at 20-40 labels; output bit-identical (verified 379/379 points exact to the rounded decimal). The non-_crop fallback is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
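To illustrate what computing every label's centroid without a per-label loop means, here is a NumPy-only sketch. It swaps in `np.bincount` with weights and does not use cc3d, so it is not the `np_center_of_mass` implementation; `centroids_one_pass` is a hypothetical name.

```python
import numpy as np


def centroids_one_pass(seg: np.ndarray) -> dict[int, tuple[float, ...]]:
    """Centroid of every non-zero label without looping over labels (hypothetical helper)."""
    flat = seg.ravel()
    counts = np.bincount(flat)                              # voxels per label
    coords = np.indices(seg.shape).reshape(seg.ndim, -1)    # per-axis coordinate of every voxel
    # Per axis: sum of coordinates per label, accumulated in a single bincount call.
    sums = [np.bincount(flat, weights=c, minlength=counts.size) for c in coords]
    return {
        int(lab): tuple(float(s[lab] / counts[lab]) for s in sums)
        for lab in np.flatnonzero(counts)
        if lab != 0
    }


seg = np.zeros((4, 4, 4), dtype=np.uint8)
seg[0:2, 0:2, 0:2] = 1
seg[3, 3, 3] = 2
centroids = centroids_one_pass(seg)
```

`np.indices` materialises a coordinate array as large as the volume times its dimensionality, so this sketch trades memory for simplicity.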

POI_Global construction (to_global) and to_other applied the affine transform one point at a time in a Python loop. Added vectorized local_to_global_arr (POI) and global_to_local_arr (Has_Grid) that transform an (N,3) array in a single matmul, and use them in those loops. ~7-8x faster (100-400 points); output bit-identical (verified vs per-point, with/without itk_coords). to_other keeps the per-point path when verbose=True to preserve its logging. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
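A sketch of the batched transform, assuming a 4x4 homogeneous affine and an (N,3) point array. The function names and the example affine are made up here; they are not the `local_to_global_arr` / `global_to_local_arr` methods, and `itk_coords` handling is left out.

```python
import numpy as np


def local_to_global_batch(affine: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 4x4 affine to an (N, 3) array of points with a single matmul."""
    points = np.asarray(points, dtype=float)
    return points @ affine[:3, :3].T + affine[:3, 3]


def global_to_local_batch(affine: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Inverse mapping: invert the affine once, then transform all points together."""
    return local_to_global_batch(np.linalg.inv(affine), points)


affine = np.array(
    [[0.0, -2.0, 0.0, 10.0],
     [1.5, 0.0, 0.0, -4.0],
     [0.0, 0.0, 3.0, 7.0],
     [0.0, 0.0, 0.0, 1.0]]
)
rng = np.random.default_rng(0)
pts = rng.random((200, 3)) * 100
batch = local_to_global_batch(affine, pts)

# Per-point reference: one homogeneous matrix-vector product per point.
loop = np.array([(affine @ np.append(p, 1.0))[:3] for p in pts])
```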

flatten-mode filtering did candidates.copy() then list.remove() per dropped file (each remove is O(n) -> O(n^2) overall). Replaced with a single list comprehension; ~48x faster filtering 2000 candidates. The dict-mode branches likewise drop the throwaway dict copy()+pop() for a dict comprehension. Output identical (verified across flatten/dict x keys for both filter methods). The comprehension also removes by identity, avoiding list.remove's first-equal removal quirk. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
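The two filtering patterns, reduced to plain lists. The function names and the `keep` predicate are hypothetical; the real code filters BIDS candidates.

```python
def filter_by_remove(candidates: list, keep) -> list:
    """Old pattern: copy, then list.remove() per dropped item (each remove scans the list)."""
    out = candidates.copy()
    for c in candidates:
        if not keep(c):
            out.remove(c)  # O(n) search, and it removes the first *equal* element
    return out


def filter_by_comprehension(candidates: list, keep) -> list:
    """New pattern: one pass, keeping items in their original order."""
    return [c for c in candidates if keep(c)]


candidates = list(range(2000))
is_even = lambda x: x % 2 == 0  # noqa: E731
kept = filter_by_comprehension(candidates, is_even)
```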

_get_mesh called from_segmentation_nii(extract_label(u)) for every label, and that reorients + rescales (resamples) the image each time. Reorient/rescale commute with extract_label for nearest-neighbour segmentation resampling, so the image is now transformed once before the loop. ~5x (12 labels) to ~7x (25 labels) faster on the transform; the per-label marching-cubes meshes are bit-identical (verified arrays and mesh vertices for rescale_to_iso True/False). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
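The commutation argument can be checked on a toy array. In this sketch an axis swap, a flip, and a 2x `np.repeat` upsampling stand in for reorientation and nearest-neighbour rescaling; the real code resamples NII images, which is not reproduced here.

```python
import numpy as np


def transform(arr: np.ndarray) -> np.ndarray:
    """Stand-in for reorient + nearest-neighbour rescale: swap two axes, flip one, upsample x2."""
    arr = np.flip(arr.transpose(1, 0, 2), axis=2)
    for ax in range(arr.ndim):
        arr = np.repeat(arr, 2, axis=ax)  # nearest-neighbour upsampling copies values unchanged
    return arr


rng = np.random.default_rng(0)
seg = rng.integers(0, 5, size=(6, 7, 8)).astype(np.uint8)
labels = [int(lab) for lab in np.unique(seg) if lab != 0]

# Old order: extract each label, then transform every binary mask separately.
per_label = [transform(seg == lab) for lab in labels]

# New order: transform the label image once, then extract each label.
hoisted = transform(seg)
once = [hoisted == lab for lab in labels]
```

The equality holds because every output voxel is a copy of exactly one input voxel; an interpolating resampler would blend label values and break it.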

Copilot

Pull request overview

This PR focuses on performance optimizations across segmentation/label-array utilities, POI coordinate transforms, mesh preview generation, and BIDS candidate filtering, and adds a set of speedtest scripts to benchmark the proposed improvements.

Changes:

Introduces faster label/segmentation primitives (np_isin LUT path, np_volume heuristic, faster np_center_of_mass/np_bounding_boxes, 3D-specialized np_bbox_binary, and reduced-copy np_dilate_msk inner loop).
Vectorizes POI coordinate conversions and accelerates centroid computation by using a single cc3d statistics pass.
Optimizes higher-level workflows (mesh preview label loop, NII label operations, BIDS filter loops) and adds multiple benchmarking scripts under tests/speedtests/.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
TPTBox/tests/speedtests/speedtest_volume.py	New benchmark for `np_volume` implementations across label-count regimes.
TPTBox/tests/speedtests/speedtest_poi_to_global.py	New benchmark for batched POI local→global conversion.
TPTBox/tests/speedtests/speedtest_poi_calc_centroids.py	New benchmark for centroid computation approaches.
TPTBox/tests/speedtests/speedtest_nii_truncate_masks.py	New benchmark for mask extraction optimization in truncation logic.
TPTBox/tests/speedtests/speedtest_nii_remove_labels.py	New benchmark for `remove_labels` implementations (loop vs map/isin).
TPTBox/tests/speedtests/speedtest_nii_map_labels.py	New benchmark for avoiding `np_unique` scans when `verbose=False`.
TPTBox/tests/speedtests/speedtest_nii_extract_label_keep.py	New benchmark for `extract_label(..., keep_label=True)` optimization.
TPTBox/tests/speedtests/speedtest_mesh_preview_hoist.py	New benchmark for hoisting reorient/rescale outside per-label mesh loop.
TPTBox/tests/speedtests/speedtest_isin_lut.py	New benchmark comparing `np.isin` modes vs explicit LUT.
TPTBox/tests/speedtests/speedtest_dilate_vectorized.py	New benchmark for reduced-copy dilation inner loop (`out == i`).
TPTBox/tests/speedtests/speedtest_center_of_mass.py	New benchmark for direct voxel-count filtering in cc3d stats postprocessing.
TPTBox/tests/speedtests/speedtest_bids_filter.py	New benchmark for O(n) list comprehension filter vs O(n²) remove loop.
TPTBox/tests/speedtests/speedtest_bbox_binary.py	New benchmark for 3D `np_bbox_binary` 2-pass specialization.
TPTBox/mesh3D/html_preview.py	Hoists reorient/rescale once for per-label mesh generation.
TPTBox/core/poi.py	Adds `local_to_global_arr` and speeds up `calc_centroids` (cc3d-based path).
TPTBox/core/poi_fun/poi_global.py	Uses batched affine/inverse-affine conversions when not verbose.
TPTBox/core/np_utils.py	Adds `np_isin`; updates multiple utilities to use it; optimizes volume/COM/bbox/dilate/bbox_binary.
TPTBox/core/nii_wrapper.py	Uses `np_isin` in truncation/extract-label; avoids verbose-only scans; speeds `remove_labels` via `np_map_labels`.
TPTBox/core/nii_poi_abstract.py	Adds `global_to_local_arr` vectorized conversion.
TPTBox/core/bids_files.py	Replaces copy+remove loops with comprehensions (flatten and dict modes).


    else:
        arrc = arr
        if labels is not None:
            arrc = arrc.copy()
-            arrc[np.isin(arr_bin, labels, invert=True)] = 0
+            arrc[np_isin(arr_bin, labels, invert=True)] = 0


robert-graf · 2026-06-12T06:49:07Z

+
+    eq = lambda x, y: x == y  # noqa: E731
+
+    for n_labels in (100, 400):


How much time does this save? Usually, POI resampling is negligibly fast.

robert-graf · 2026-06-12T06:53:34Z

Fix the Copilot comment and my int comment. Rest LGTM.

Hendrik-code and others added 13 commits June 10, 2026 16:19

Hendrik-code requested a review from robert-graf June 11, 2026 18:24

Hendrik-code self-assigned this Jun 11, 2026

Copilot AI review requested due to automatic review settings June 11, 2026 18:24

Hendrik-code added the speedimprove label (Changes that improve speed of code execution) Jun 11, 2026

Copilot started reviewing on behalf of Hendrik-code June 11, 2026 18:24

Copilot AI reviewed Jun 11, 2026

Comment thread TPTBox/core/np_utils.py

Comment on lines 471 to +475

Comment thread TPTBox/core/np_utils.py

robert-graf reviewed Jun 12, 2026

Comment thread TPTBox/core/poi.py Outdated

robert-graf reviewed Jun 12, 2026

copilot and robert address

a47f632

Hendrik-code merged commit f81d7d0 into main Jun 29, 2026
5 checks passed

Hendrik-code deleted the optimization branch June 29, 2026 09:49

Hendrik-code merged 14 commits into main from optimization

3 participants

